Decision Tree Discovery

Authors

Ron Kohavi

Ross Quinlan

Abstract

We describe the two most commonly used systems for induction of decision trees for classification: C4.5 and CART. We highlight the methods and different decisions made in each system with respect to splitting criteria, pruning, noise handling, and other differentiating features. We describe how rules can be derived from decision trees and point to some differences in the induction of regression trees. We conclude with some pointers to advanced techniques, including ensemble methods, oblique splits, grafting, and coping with large data.

C4.5 belongs to a succession of decision tree learners that trace their origins back to the work of Hunt and others in the late 1950s and early 1960s (Hunt 1962). Its immediate predecessors were ID3 (Quinlan 1979), a simple system consisting initially of about 600 lines of Pascal, and C4 (Quinlan 1987). C4.5 has grown to about 9,000 lines of C and is available on diskette with Quinlan (1993). Although C4.5 has been superseded by C5.0, a commercial system from RuleQuest Research, this discussion will focus on C4.5 since its source code is readily available.
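The splitting criteria mentioned in the abstract can be illustrated with a small sketch. C4.5 is generally described as ranking candidate splits by gain ratio, while CART by default uses the Gini index. The code below is not taken from either system: the function names and the toy data are assumptions made for illustration, and the Gini computation is applied to a multiway partition for simplicity, whereas CART itself builds binary splits.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (in bits) of the value distribution in `items`."""
    n = len(items)
    return -sum((c / n) * math.log2(c / n) for c in Counter(items).values())


def gini(labels):
    """Gini impurity of the class distribution in `labels`."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def partition(values, labels):
    """Group class labels by the attribute value of each example."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    return list(groups.values())


def gain_ratio(values, labels):
    """Information gain divided by split information (C4.5-style criterion)."""
    n = len(labels)
    groups = partition(values, labels)
    gain = entropy(labels) - sum(len(g) / n * entropy(g) for g in groups)
    split_info = entropy(values)  # entropy of the attribute's own values
    return gain / split_info if split_info > 0 else 0.0


def gini_decrease(values, labels):
    """Reduction in Gini impurity achieved by the split (CART-style criterion)."""
    n = len(labels)
    groups = partition(values, labels)
    return gini(labels) - sum(len(g) / n * gini(g) for g in groups)


# Hypothetical toy data: one attribute that separates the classes perfectly
# and one that carries no information about the class.
play = ["no", "no", "yes", "yes"]
outlook = ["sunny", "sunny", "rain", "rain"]
windy = ["a", "b", "a", "b"]

for name, attr in [("outlook", outlook), ("windy", windy)]:
    print(name, gain_ratio(attr, play), gini_decrease(attr, play))
```

On this toy data both criteria prefer the attribute that separates the classes; they tend to diverge on attributes with many distinct values, where the split-information term penalizes fine-grained partitions.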

Similar resources

An Efficient Approach for Knowledge Discovery in Decision Trees using Inter Quartile Range Transform

Data mining and knowledge discovery is used for discovery of hidden knowledge from large data sources. Decision trees are one of the most famous classification techniques with simple and efficient generalization technique. This paper presents a new decision tree algorithm IQ Tree for classification problem. The IQ Tree assumes using an inter quartile range conversion of attributes with C4.5 as ...

Knowledge discovery from patients’ behavior via clustering-classification algorithms based on weighted eRFM and CLV model: An empirical study in public health care services

The rapid growing of information technology (IT) motivates and makes competitive advantages in health care industry. Nowadays, many hospitals try to build a successful customer relationship management (CRM) to recognize target and potential patients, increase patient loyalty and satisfaction and finally maximize their profitability. Many hospitals have large data warehouses containing customer ...

Knowledge Discovery Process for Description of Spatially Referenced Clusters

Spatial clustering is an important field of spatial data mining and knowledge discovery that serves to partition a spatial data set to obtain disjoint subsets with spatial elements that are similar to each other. Existing algorithms can be used to perform three types of cluster analyses, including clustering of spatial points, regionalization and point pattern analysis. However, all these exist...

Applying Data Mining Techniques in Property/Casualty Insurance

This paper addresses the issues and techniques for Property/Casualty actuaries using data mining techniques. Data mining means the efficient discovery of previously unknown patterns in large databases. It is an interactive information discovery process that includes data acquisition, data integration, data exploration, model building, and model validation. The paper provides an overview of the ...

Generating a mortality model from a pediatric ICU (PICU) database utilizing knowledge discovery

Current models for predicting outcomes are limited by biases inherent in a priori hypothesis generation. Knowledge discovery algorithms generate models directly from databases, minimizing such limitations. Our objective was to generate a mortality model from a PICU database utilizing knowledge discovery techniques. The database contained 5067 records with 192 clinically relevant variables. It w...

Publication date: 1999
